
6 research outputs found

Deep Learning based 3D Segmentation: A Survey

Author: Fu Qiang
He Yong
Liu Xiaoyan
Mian Ajmal
Sun Wei
Wang Yaonan
Yang Zhengeng
Yu Hongshan
Zou Yanmei
Publication date: 09/03/2021

3D object segmentation is a fundamental and challenging problem in computer vision with applications in autonomous driving, robotics, augmented reality and medical image analysis. It has received significant attention from the computer vision, graphics and machine learning communities. Traditionally, 3D segmentation was performed with hand-crafted features and engineered methods which failed to achieve acceptable accuracy and could not generalize to large-scale data. Driven by their great success in 2D computer vision, deep learning techniques have recently become the tool of choice for 3D segmentation tasks as well. This has led to an influx of a large number of methods in the literature that have been evaluated on different benchmark datasets. This paper provides a comprehensive survey of recent progress in deep learning based 3D segmentation covering over 150 papers. It summarizes the most commonly used pipelines, discusses their highlights and shortcomings, and analyzes the competitive results of these segmentation methods. Based on the analysis, it also provides promising research directions for the future.

Comment: Under review of ACM Computing Surveys, 36 pages, 10 tables, 9 figure

arXiv.org e-Print Archive

First Place Solution to the CVPR'2023 AQTC Challenge: A Function-Interaction Centric Approach with Spatiotemporal Visual-Language Alignment

Author: Chen Chen
Chen Tom Tongjia
Li Ming
Li Zechuan
Miao Wei
Sun Wei
Wang Jingwen
Yang Zhengeng
Yu Hongshan
Publication date: 23/06/2023

Affordance-Centric Question-driven Task Completion (AQTC) has been proposed to acquire knowledge from videos to furnish users with comprehensive and systematic instructions. However, existing methods have hitherto neglected the necessity of aligning spatiotemporal visual and linguistic signals, as well as the crucial interactional information between humans and objects. To tackle these limitations, we propose to combine large-scale pre-trained vision-language and video-language models, which serve to contribute stable and reliable multimodal data and facilitate effective spatiotemporal visual-textual alignment. Additionally, a novel hand-object-interaction (HOI) aggregation module is proposed which aids in capturing human-object interaction information, thereby further augmenting the capacity to understand the presented scenario. Our method achieved first place in the CVPR'2023 AQTC Challenge, with a Recall@1 score of 78.7%. The code is available at https://github.com/tomchen-ctj/CVPR23-LOVEU-AQTC.

Comment: Winner of CVPR2023 Long-form Video Understanding and Generation Challenge (Track 3

arXiv.org e-Print Archive

Robust Robot Pose Estimation for Challenging Scenes With an RGB-D Camera

Author: Hongshan Yu
Lei Tan
Mingui Sun
Qiang Fu
Wei Sun
Zhengeng Yang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Human-Mimetic Estimation of Food Volume from a Single-View RGB Image Using an AI System

Author: Ding Yuan
Hong Zhang
Hongshan Yu
Mingui Sun
Qi Xu
Shunxin Cao
Wenyan Jia
Zhengeng Yang
Zhi-Hong Mao
Publication venue: 'MDPI AG'
Publication date: 01/06/2021

It is well known that many chronic diseases are associated with unhealthy diet. Although improving diet is critical, adopting a healthy diet is difficult despite its benefits being well understood. Technology is needed to assess dietary intake accurately and easily in real-world settings so that effective interventions to manage overweight, obesity, and related chronic diseases can be developed. In recent years, new wearable imaging and computational technologies have emerged. These technologies are capable of performing objective and passive dietary assessments with a much simpler procedure than traditional questionnaires. However, a critical task is to estimate the portion size (in this case, the food volume) from a digital image. Currently, this task is very challenging because the volumetric information in two-dimensional images is incomplete, and the estimation involves a great deal of imagination, beyond the capacity of traditional image processing algorithms. In this work, we present a novel Artificial Intelligence (AI) system to mimic the thinking of dietitians who use a set of common objects as gauges (e.g., a teaspoon, a golf ball, a cup, and so on) to estimate the portion size. Specifically, our human-mimetic system “mentally” gauges the volume of food using a set of internal reference volumes that have been learned previously. At the output, our system produces a vector of probabilities of the food with respect to the internal reference volumes. The estimation is then completed by an “intelligent guess”, implemented by an inner product between the probability vector and the reference volume vector. Our experiments using both virtual and real food datasets have shown accurate volume estimation results
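The final step described in this abstract, an inner product between a probability vector and a reference volume vector, can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration: the function name `expected_volume`, the three reference volumes in millilitres, and the probability values are hypothetical and are not taken from the paper, whose learned reference volumes and network outputs are not given in this listing.

```python
from typing import Sequence


def expected_volume(probabilities: Sequence[float],
                    reference_volumes_ml: Sequence[float]) -> float:
    """Inner product of a probability vector and a reference-volume vector.

    Hypothetical helper: the abstract only says the estimate is an inner
    product; the normalisation step below is an added safeguard.
    """
    if len(probabilities) != len(reference_volumes_ml):
        raise ValueError("vectors must have the same length")
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalise so the weights sum to 1, then take the weighted sum.
    return sum((p / total) * v
               for p, v in zip(probabilities, reference_volumes_ml))


# Assumed reference volumes in mL (roughly a teaspoon, a golf ball, a cup).
REFERENCE_VOLUMES_ML = [5.0, 40.0, 240.0]
# Assumed model output: probability of the food matching each reference.
PROBABILITIES = [0.1, 0.6, 0.3]

ESTIMATE_ML = expected_volume(PROBABILITIES, REFERENCE_VOLUMES_ML)
```

With these assumed inputs the estimate falls between the golf-ball and cup volumes, weighted toward whichever references the model considers most likely.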

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

Methods and datasets on semantic segmentation: A review

Author: Sun Mingui
Sun Wei
Tan Lei
Tang YD(唐延东)
Wang YN(王耀南)
Yang Zhengeng
Yu HS(余洪山)
Publication date: 01/01/2018

Semantic segmentation, also called scene labeling, refers to the process of assigning a semantic label (e.g. car, people, and road) to each pixel of an image. It is an essential data processing step for robots and other unmanned systems to understand the surrounding scene. Despite decades of effort, semantic segmentation is still a very challenging task due to large variations in natural scenes. In this paper, we provide a systematic review of recent advances in this field. In particular, three categories of methods are reviewed and compared, including those based on hand-engineered features, learned features and weakly supervised learning. In addition, we describe a number of popular datasets aimed at facilitating the development of new segmentation algorithms. In order to demonstrate the advantages and disadvantages of different semantic segmentation models, we conduct a series of comparisons between them. In-depth discussions of the comparisons are also provided. Finally, this review is concluded by discussing future directions and challenges in this important field of research. (c) 2018 Elsevier B.V. All rights reserved
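As a rough illustration of what "assigning a semantic label to each pixel" means in practice, the sketch below picks the highest-scoring class for every pixel of a tiny 2x2 score grid. This is an assumed, simplified decision rule (a per-pixel argmax), not a method taken from the review; the function name `label_pixels` and the score values are hypothetical, and the class names simply reuse the examples from the abstract.

```python
from typing import List, Sequence


def label_pixels(scores: Sequence[Sequence[Sequence[float]]],
                 class_names: Sequence[str]) -> List[List[str]]:
    """Return a label map by taking the top-scoring class at each pixel.

    scores[row][col] holds one score per class for that pixel.
    """
    label_map = []
    for row in scores:
        labelled_row = []
        for pixel_scores in row:
            if len(pixel_scores) != len(class_names):
                raise ValueError("one score per class is required")
            # Index of the largest score (first one wins on ties).
            best = max(range(len(pixel_scores)),
                       key=lambda k: pixel_scores[k])
            labelled_row.append(class_names[best])
        label_map.append(labelled_row)
    return label_map


CLASS_NAMES = ["car", "people", "road"]
# Assumed per-pixel class scores for a 2x2 image.
SCORES = [
    [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
    [[0.2, 0.5, 0.3], [0.0, 0.1, 0.9]],
]

LABEL_MAP = label_pixels(SCORES, CLASS_NAMES)
```

The output has the same height and width as the input grid, with one class name per pixel, which is the form of result that segmentation benchmarks compare against ground-truth label maps.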

Shenyang Institute of Automation, Chinese Academy of Sciences